Abstract: In this paper we consider how to automatically create pleasing
photo collages by placing a set of images on a limited canvas area.

The task is formulated as an optimization problem. Differently from existing
state-of-the-art approaches, we here exploit subjective experiments to
model and learn pleasantness from user preferences. To this end, we design
an experimental framework for the identification of the criteria that need
to be taken into account to generate a pleasing photo collage. Five different
thematic photo datasets are used to create collages using state-of-the-art
criteria. A first subjective experiment, in which several subjects evaluated the
collages, shows that different criteria are involved in the subjective definition
of pleasantness. We then identify new global and local criteria and
design algorithms to quantify them. The relative importance of these criteria
is automatically learned by exploiting the user preferences, and new
collages are generated. To validate our framework, we performed several
psycho-visual experiments involving different users. The results show that
the proposed framework makes it possible to learn a novel computational model which
effectively encodes an inter-user definition of pleasantness. The learned definition
of pleasantness generalizes well to new photo datasets of different
themes and sizes not used in the learning. Moreover, compared with two
state-of-the-art approaches, the collages created using our framework are
preferred by the majority of the users.

Keywords and phrases: Image Processing and Computer Vision, Mathematics
of Computing, Optimization and Learning, Artificial Intelligence,
Image Processing and Computer Vision, Applications.


1. Introduction 

Fig 1. Example photo collage created by placing a number of photo images on a canvas area
of limited size.

Photo collages are created by placing a number of photo images on a canvas area
of limited size. They are used to visually represent events of interest in an
appealing and compact way. The images can be fitted on the canvas by simply scaling
them, at the risk of losing important details contained in them and making the
collage dull. In this paper we consider the problem of how to automatically
create pleasing photo collages: given a set of photo images and a canvas area,
we want to arrange the photos on the canvas in a pleasant, unsupervised manner
and without scaling them (see Figure 1). Assuming that the size of the canvas
area is smaller than the sum of the sizes of the photos to be displayed, two
main issues arise. The first is that photos may occlude one another; the second
is that photos may lie partially outside the canvas area. These issues must be
addressed by taking into account the pleasantness of the resulting collage, which is
influenced by the order in which the photos are placed on the canvas and by their
spatial arrangement. Usually, the most important photos are placed on top of
the less important ones in order to minimize the risk of their being severely occluded,
and composition properties related to photo contents, geometric constraints and
aesthetic considerations are taken into account to maximize the pleasantness of
the resulting collage. The criteria that define what is important in a photo and
what composition properties should be satisfied may vary from user to user.
Moreover, the individual criteria may compete against each other. To be used in
an automatic system for photo collage generation, the pleasantness criteria and
their relative importance must be properly quantified using suitable algorithms.
At the end of this process a fitness function can be defined whose value represents
the overall degree of pleasantness of a photo collage. To obtain the most pleasant
collage, an automatic algorithm must search for the best arrangement of the photos
by maximizing the value of the fitness function. For this purpose an optimization
algorithm is usually exploited. Several formulations of some of the above criteria
have been proposed in the literature, but none of the existing works performed
a user study in order to actually determine what criteria make a
photo important, what constraints must be satisfied in order to have a balanced
collage, or what hints users pay attention to in judging the pleasantness of
a photo collage. We argue that if we could elicit the criteria by modeling the
preferences of the users, we would be able to create more pleasant photo collages.


1.1. Related Work

Previous works on photo collage can be categorized into two main groups depending
on the processing applied to the photos. These two groups are: content-preserving
and non content-preserving.

The non content-preserving group comprises the photo collage methods that
select relevant regions within the photos in order to maximize the information
that is conveyed in the final collage. The methods ensure that these regions







User Preferences Modeling and Learning for Pleasing Photo Collage Generation 3 


are made visible in the final collage while the less relevant regions are either
removed (cropping) or hidden by other, more relevant, ones (hiding). In addition
to scaling and translation, these methods usually perform a layering of the
photos to decide the order in which they are positioned on the canvas and/or
rotate the photos to further preserve their content as much as possible.

Among the methods that also apply a rotation operation on the photos, we
find Picture Collage [37, 26], which is one of the first works that formalized the
problem of photo collage as an optimization problem using different, competing
collage criteria, namely image saliency, blank space, and saliency ratio balance.
Inspired by this work is the collage strategy proposed in [2], which uses the same
criteria, but images are first classified into three categories and then different
relevant region detection strategies are adopted on the basis of the image
category. Also inspired by the work of [37] are the improved collage strategies
proposed by [38] and [41], where the collages can also be interactively modified by
the user. A recent photo collage approach [42] uses a heuristic search process to
ensure that salient information of each photo is displayed in the polygonal area
resulting from a power-diagram-based circle packing algorithm. Most of the previous
approaches use a saliency map, solely or coupled with other descriptors, as
the informativeness criterion. In [13] instead, the informativeness criterion corresponds
to foreground objects detected on depth maps. Finally, differently from all the
aforementioned approaches, the method proposed in [19] creates Arcimboldo-like
collages with multiple thematically-related cutouts from filtered Internet
images.

The stained glass-like photo collage by [15] is one of the methods that preserve
the photo orientation without rotating them. The photos are cropped with
respect to the contained face regions. These cropped regions have straight edges
that are used to arrange the photos on the canvas. Digital Tapestry [33] subdivides
the photo into a set of sub-blocks and from them, the relevant regions of
the photo are reconstructed and merged together. A pixel-based variant of this
approach, named AutoCollage, is described in [34]. Here the relevant regions,
with variable shapes, are merged with a seamless blending that ensures that no
sharp boundaries between them are formed in the final collage. A similar approach
is the Mobile Photo Collage presented in [25]. The Puzzle-Like collage
[16] instead cuts out from each photo an irregularly shaped region which follows
the area surrounding a relevant object within the image. Finally, we can cite
the Dynamic Media Assemblage [27], a photo collage approach that can be used
to summarize video content as well as a photo collection in a stained glass-like
collage.

The content-preserving group comprises those methods that arrange the photos
according to the relevance of their content defined in some way. The only
operations performed on them are scaling and translation. Usually the most
relevant photos are scaled bigger than the less relevant ones, and they are positioned
on the most salient regions of the canvas. Moreover, the aspect ratio of the
photos is preserved. These methods are also referred to as photo layout methods.

An example is the work of [7] where the photo layout is constructed using
a larger topic photo and several small-size supportive photos. The photos are
selected and sized according to their temporal and content coherence. A similar
approach is exploited in [5] on video sequences where key-frames in a visual
summary are arranged on the canvas using the narrative grammar of comics.
Also within this group we can cite the work of [6] where exclusion zones are
used to lay out a set of photos on a canvas using different spatial criteria. This
method was further improved in [14]. In [35] spatial criteria are coupled with
aesthetic principles to lay out photos in a pleasant composition. Recently, taking
advantage of information usually found in social networking, and building on
the previous PicWall work [39], FriendWall [40] uses social attributes (intrinsic
labels) to create photo collages employing both image visual features and associated
metadata. As a final example, we can cite the interactive approach [10]
where pre-designed layout templates of annotated cells are used to arrange the
photos according to their metadata, and focus areas can be selected by the user.

Fig 2. Experimental framework.

1.2. Paper Contribution and Organization 

The focus of this work is to exploit subjective experiments to model user preferences
in order to learn what criteria (and to what extent) need to be taken
into account to automatically generate a pleasing photo collage. To this end we
designed an experimental framework that incorporates the identification of the
criteria via user preference modeling, the implementation of the corresponding
computational algorithms, the learning of their relative importance, and
the validation of the results. We applied our framework in the context of non
content-preserving collages. We believe that this category permits the investigation of
more criteria underlying the definition of pleasantness, as the associated problem
has more degrees of freedom than the one associated with content-preserving
collages. However, our proposed framework can be adapted to these methods as
well. The different steps of our framework are depicted in Figure 2. A first subjective
experiment is conducted to investigate how different criteria are involved
in the user subjective definition of pleasantness. For this experiment, we redefine
the three basic criteria (image informativeness, canvas area coverage, and information
ratio balance) exploited in most of the works in the state-of-the-art (e.g.
[37, 2]). We evaluate three different representations of image informativeness:
the first one, which is usually used in the state-of-the-art, is based on saliency;
the other two are based on quality and color harmony respectively, and are here
introduced. Collages are created by exploiting a Direct Search optimization algorithm.
Since user image collections are of very different contents, and different
contents may lead to different pleasantness criteria, we considered five thematic
image datasets. The results obtained from this experiment are used to identify
new criteria both at global and local level. The new global criteria are: face ratio,
axis alignment, centrality, and convexity; the new local criteria are: color similarity,
orientation diversity and minimum orientation difference. After having
developed algorithms to compute these new criteria, their relative importance
is learned by exploiting user rankings on the previously created collages. The
identified criteria and their learned importance are then used to generate new
sets of collages that are evaluated by a new panel of users. To further validate
the proposed framework, we performed three additional experiments. In order
to verify whether the identified criteria and their learned relative importance generalize
well, that is, whether they can be used to create collages on unseen image sets, we
performed a subjective experiment on six other image collections whose contents
differ from the ones used in the previous experiments. We also
tested the generalizability of the learned definition of pleasantness by creating
collages varying the number of images in the set and the canvas size. Moreover,
we compared the performance of our proposal against two state-of-the-art
algorithms. To the best of our knowledge this is the first work which extensively
exploits subjective experiments within the collage generation process to
learn user preferences, and that uses datasets of images of different contents to
validate the proposed approach.

The rest of the paper is organized as follows. The problem formulation is
mathematically described in Section 2 along with the description of the basic
criteria. Section 3 illustrates the collage generation by describing the three
different importance maps considered in our experiments, the photo datasets
used, and the optimization algorithm responsible for the collage creation. The
first subjective experiment and its outcomes are described in Section 4. The set
of the new criteria derived from the first experiment is described in Section 5,
while the user preferences modeling and learning strategy is detailed in Section
6. Results of the second subjective experiment performed on the newly created
collages are illustrated in Section 7. The generalizability of the learned definition
of pleasantness and the comparison with state-of-the-art methods on new
datasets are reported in Section 8. Finally, Section 9 concludes the paper.


2. Problem Formulation and Basic Criteria Definition 

Given N input photo images $I = \{I_i\}_{i=1}^{N}$ and their corresponding importance
maps $M = \{M_i\}_{i=1}^{N}$ (importance map representations will be discussed in the
next section), a photo collage algorithm must arrange all the images on a canvas
area C. In a photo collage, each image $I_i$ is characterized by its state
$s_i = (\mathbf{t}_i, \theta_i, l_i)$, where $\mathbf{t}_i = (t_{i,x}, t_{i,y})$ is the 2D translation vector
(w.r.t. the canvas origin), $\theta_i$ is the orientation angle (w.r.t. the x-axis), and $l_i$
is the layering index used to determine the placement order of the image. The state
is used in a roto-translation transformation $T(\cdot, s_i)$ to position the image (and
its importance map) on the canvas area:

$$ \mathcal{X}_i = T(I_i, s_i), \qquad \mathcal{M}_i = T(M_i, s_i) \qquad (1) $$

The layering indexes can be manually or automatically assigned according
to some heuristics. We compute the layering indexes $l_i$ on the basis of the 2D
integrals of the importance maps $M_i$: images with higher importance maps are
placed on top layers, while images with lower importance maps are placed on
bottom layers. An example of the procedure used for photo collage layering and
compositing is reported in Figure 3.

Fig 3. Photo collage layering and compositing

The picture collage creation is formulated as an optimization problem in
order to find the best configuration of states $S = \{s_i\}_{i=1}^{N}$ which optimizes all
the criteria considered.

Basic criteria used in most photo collage formulations

Criterion                 Description                                       Function
Visibility                Visible image content (based on importance map)   C1
Canvas coverage           Canvas area covered by the photos                 C2
Visibility ratio balance  Visible image region w.r.t. image size            C3
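The state and the layering rule described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `State` and `layering_indexes` are hypothetical, NumPy arrays stand in for the importance maps, and the convention that a larger layering index is drawn later (on top) is an assumption, since the text does not fix the numbering direction.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class State:
    """State s_i = (t_i, theta_i, l_i) of one image (illustrative names)."""
    tx: int        # x translation w.r.t. the canvas origin, in pixels
    ty: int        # y translation w.r.t. the canvas origin, in pixels
    theta: float   # orientation angle w.r.t. the x-axis, in radians
    layer: int     # layering index; assumed: larger values are drawn on top


def layering_indexes(importance_maps):
    """Rank images by the 2D integral of their importance map.

    The map with the smallest integral gets layer 0 (bottom), the one with
    the largest integral gets layer N-1 (top).
    """
    totals = np.array([m.sum() for m in importance_maps])
    order = np.argsort(totals, kind="stable")   # ascending importance
    layers = np.empty(len(totals), dtype=int)
    layers[order] = np.arange(len(totals))
    return layers.tolist()
```

For three hypothetical maps whose integrals are 2.0, 3.6 and 0.4, the second image would be placed on top and the third at the bottom.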

2.1. Basic Criteria Definition

Most of the existing photo collage methods (e.g. [37]) exploit the three "basic
criteria" listed in Table 2.1. These criteria are quantified by the functions
$C_i(S; \mathcal{X}, \mathcal{M}, C)$. The functions are parametrized by the configuration of states
S, and take as data the set of transformed images $\mathcal{X}$, the set of transformed
importance maps $\mathcal{M}$, and the canvas C. In the following we write the functions
$C_i(S; \mathcal{X}, \mathcal{M}, C)$ as $C_i(\cdot)$, dropping the dependencies for a more compact notation.

Visibility. The overall collage visibility is the average of all information ratios
(based on an importance map) computed on the visible regions of the images:

$$ C_1(\cdot) = \frac{1}{N} \sum_{i=1}^{N} \frac{\mathrm{sum}_2(\mathrm{vis}(\mathcal{M}_i))}{\mathrm{sum}_2(\mathcal{M}_i)} \qquad (2) $$

where $\mathrm{vis}(\cdot)$ is a function that computes the visible parts (taking into account
clipping and overlapping) of the given map, and $\mathrm{sum}_2(\cdot)$ is a function that
computes the 2D integral of the map.


Canvas coverage. The canvas coverage is defined as the ratio of canvas area
covered by the arranged photos:

$$ C_2(\cdot) = \frac{1}{\mathrm{area}(C)} \sum_{i=1}^{N} \mathrm{area}(\mathrm{vis}(\mathcal{M}_i)) \qquad (3) $$

where $\mathrm{area}(\cdot)$ is a function that computes the area corresponding to the given
input.


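Visibility ratio balance. The visibility ratio balance is computed from the standard
deviation of the information ratios:

$$ C_3(\cdot) = 1 - \underset{i=1 \ldots N}{\mathrm{std}} \left\{ \frac{\mathrm{sum}_2(\mathrm{vis}(\mathcal{M}_i))}{\mathrm{sum}_2(\mathcal{M}_i)} \right\} \qquad (4) $$

where $\mathrm{std}\{\cdot\}$ computes the standard deviation of the given values.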


The values obtained are combined into a fitness function $f$ that must be
maximized:

$$ f = \sum_{i=1}^{3} \lambda_i C_i(\cdot) \qquad (5) $$

with $\lambda_i$, $i = 1, \ldots, 3$, a weight used to define the contribution of the i-th criterion
(usually fixed to 1). This fitness function is at the basis of most of the photo
collage algorithms in the state-of-the-art.
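To make Equations 2 to 5 concrete, the following sketch evaluates the three basic criteria and the fitness value for a toy collage. It is a simplification under stated assumptions, not the implementation used in the paper: images are axis-aligned (rotation is ignored), a larger layering index is assumed to be on top, importance maps are assumed to have a non-zero integral, the population standard deviation is assumed for Equation 4, and the function names are hypothetical.

```python
import numpy as np


def visible_masks(maps, positions, layers, canvas_shape):
    """Per-image boolean masks of pixels that are on the canvas and not occluded.

    maps: list of 2D importance maps; positions: (x, y) of each top-left corner
    (may be negative); layers: layering indexes (larger = on top, an assumption);
    canvas_shape: (height, width). Rotation is ignored in this sketch.
    """
    H, W = canvas_shape
    owner = -np.ones((H, W), dtype=int)      # top-most image index per canvas pixel
    clipped = []
    for m, (x, y) in zip(maps, positions):
        h, w = m.shape
        clipped.append((max(x, 0), max(y, 0), min(x + w, W), min(y + h, H)))
    for i in np.argsort(layers):             # paint from the bottom layer to the top
        x0, y0, x1, y1 = clipped[i]
        if x0 < x1 and y0 < y1:
            owner[y0:y1, x0:x1] = i
    masks = []
    for i, (m, (x, y)) in enumerate(zip(maps, positions)):
        x0, y0, x1, y1 = clipped[i]
        mask = np.zeros(m.shape, dtype=bool)
        if x0 < x1 and y0 < y1:
            mask[y0 - y:y1 - y, x0 - x:x1 - x] = owner[y0:y1, x0:x1] == i
        masks.append(mask)
    return masks


def fitness(maps, masks, canvas_shape, weights=(1.0, 1.0, 1.0)):
    """Return (f, (C1, C2, C3)) following Equations 2 to 5."""
    ratios = np.array([m[v].sum() / m.sum() for m, v in zip(maps, masks)])
    c1 = ratios.mean()                                   # visibility
    c2 = sum(v.sum() for v in masks) / (canvas_shape[0] * canvas_shape[1])  # coverage
    c3 = 1.0 - ratios.std()                              # visibility ratio balance
    f = weights[0] * c1 + weights[1] * c2 + weights[2] * c3
    return f, (c1, c2, c3)
```

As a usage example, two uniform 2x2 maps placed side by side with a one-pixel overlap on a 2x3 canvas leave half of the lower image visible, so the information ratios are 0.5 and 1.0 and the canvas is fully covered.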


3. Collage Generation 

In the following subsections, assuming that a proper dataset of images is available,
we describe three different approaches to compute the image importance
map: the first approach is inspired by [28]; the other two are here introduced.
We also describe the algorithm used to place the images on the canvas area by
searching for the best configuration of states. The algorithm optimizes the fitness
function defined in Equation 5.

Fig 4. The five thematic datasets. From top to bottom: Burst of Color III (Burst), Fashion
II (Fashion), Landscape V (Landscape), Self Portrait VII (Self), and Zen Photography III
(Zen).


3.1. Photo Datasets

A collage is usually created from a set of images sharing a common underlying
theme. To create our dataset, we downloaded the images from the DPChallenge¹
web site. The site collects photos of both amateur and professional photographers
who participate in digital photography challenges. Each challenge has a
main theme that the participants must follow. All the submitted photos are then
judged by other participants, who give a numerical score. We selected five photo
challenges among the hundreds published and for each of them we collected the
14 best rated photos. The challenges have been chosen to include diverse subjects
of generic themes. The chosen challenges are: Burst of Color III (Burst
for brevity), Fashion II (Fashion), Landscape V (Landscape), Self Portrait VII
(Self), and Zen Photography III (Zen). The Burst dataset is composed of images
with a single subject; the Fashion dataset contains images of people and
accessories; the Landscape dataset is composed of mostly horizontal images; on
the contrary, the Self dataset contains mostly portrait images both in color and
in black and white; finally, the Zen dataset is composed of heterogeneous images
and in most cases it is not easy to identify the subject. This diversity makes
it possible to investigate whether people use different criteria in the creation of photo
collages for different themes. Figure 4 shows the five sets of photos.


3.2. Importance Maps 

Two of the basic criteria used in Equation 5 require the computation of importance
maps to locate the most informative regions in an image. The underlying
idea is that the most informative regions should not be hidden by other images,
thus maximizing the information displayed. Since there is no unique definition
of what is important in an image, in our investigation we tested different importance
maps exploiting three different image properties: saliency, color harmony,
and quality. Each importance map is plugged in turn into Equation 5, obtaining
three different collages for each photo dataset.

¹ http://www.dpchallenge.com/

Saliency. The first importance map is based on saliency and uses an approach
to compute it similar to the one presented in [28]. We used this approach in a
previous work on image thumbnailing [9] and the resulting saliency maps show
that, overall, a compact set of salient regions is produced. We considered
these results reasonable for our purposes. Other, more recent and precise
saliency methods can be exploited. The recent paper [8] reports the performance
of several algorithms on reference datasets; these algorithms can be used as alternatives.
For surveys related to saliency see [11, 23, 1]. To compute the saliency map, the
image is divided into small rectangular tiles. On each tile, a contrast score is
computed by comparing its average color with the average colors of the neighboring
tiles. The contrast score is assigned to each pixel in the tile. The basic
algorithm has been extended by computing three different saliency maps in the
LUV color space using neighborhoods of increasing size. Each map captures the
saliency at a different scale. These saliency maps are then filtered and combined
together into a single normalized map of values in the range [0,1]. We denote the
importance map of the i-th image computed using saliency as $M_{i,sal}$. Examples
of saliency maps are shown in the second column of Figure 5.
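The tile-based contrast computation can be illustrated with a simplified, single-scale sketch. It is not the method of [28] nor the three-scale LUV variant described above: it works directly on whatever color channels it is given, uses one neighborhood radius, omits the filtering step, and assumes that the image height and width are multiples of the tile size. The function name and parameters are hypothetical.

```python
import numpy as np


def tile_contrast_saliency(image, tile=8, radius=1):
    """Single-scale tile contrast map, normalized to [0, 1].

    image: float array of shape (H, W, 3); H and W are assumed to be
    multiples of `tile`. `radius` is the neighborhood half-size in tiles.
    """
    H, W, _ = image.shape
    th, tw = H // tile, W // tile
    # Average color of each tile.
    means = image.reshape(th, tile, tw, tile, 3).mean(axis=(1, 3))
    scores = np.zeros((th, tw))
    for r in range(th):
        for c in range(tw):
            r0, r1 = max(r - radius, 0), min(r + radius + 1, th)
            c0, c1 = max(c - radius, 0), min(c + radius + 1, tw)
            neigh = means[r0:r1, c0:c1].reshape(-1, 3)
            # Mean color distance to the neighborhood (the tile itself adds 0).
            scores[r, c] = np.linalg.norm(neigh - means[r, c], axis=1).mean()
    # Assign each tile's score to all of its pixels.
    sal = np.kron(scores, np.ones((tile, tile)))
    span = sal.max() - sal.min()
    return (sal - sal.min()) / span if span > 0 else np.zeros_like(sal)
```

A multi-scale version in the spirit of the text would call this function with increasing values of `radius` and combine the resulting maps before the final normalization.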

Harmony. Since color combinations are related to the pleasantness of an image,
for the second importance map we used the method proposed in [36] to evaluate
the color harmony of the image locally by creating a color harmony map. We chose
to use this approach because, in contrast to other approaches (e.g. [32]), it
computes an image color harmony score by considering the distribution and
spatial relationship between color regions found by the MeanShift segmentation
algorithm. In order to have a color harmony map we computed the harmony
score on pixel neighborhoods (i.e. pixels in a square region surrounding a given
pixel location) of different sizes. The harmony map is obtained by summing all
the scores and by normalizing them in the [0,1] range. We denote the importance
map of the i-th image computed using color harmony as $M_{i,har}$. The third
column of Figure 5 shows some examples of color harmony maps.

Quality. Image quality approaches model how an image is perceived if affected
by different image distortions. We cannot predict what kind of image distortions
are present, nor do we have a reference image with which to compare our photos, thus we
must consider generic (or "universal") no-reference image quality approaches.
We exploited the BRISQUE (Blind/Referenceless Image Spatial Quality Evaluator)
computational model described in [30]. The model uses different image
features in order to quantify the image quality. Since BRISQUE computes a single
quality index for an image, we implemented a neighborhood-based strategy
in order to obtain a quality map. We considered the quality index computed on
the whole image and on three pixel neighborhoods. The indexes are summed
and normalized in the [0,1] range. We denote the importance map of the i-th
image computed using image quality as $M_{i,qua}$. The fourth column of Figure 5
shows some examples of image quality maps.

Fig 5. Examples of importance maps.
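Both the harmony and the quality maps are described as neighborhood-based: a scalar score is evaluated on square windows of several sizes around each pixel, the scores are summed, and the result is normalized to [0,1]. A minimal sketch of that idea follows. The argument `score_fn` is a placeholder for any scalar scorer (a harmony score or a no-reference quality index); neither [36] nor BRISQUE is implemented here, and the window sizes, the border handling, and the function name are assumptions.

```python
import numpy as np


def neighborhood_score_map(image, score_fn, sizes=(3, 5)):
    """Sum of score_fn over square windows of several sizes, normalized to [0, 1].

    image: 2D or 3D array; score_fn: callable taking a patch and returning a
    float (placeholder for a harmony or quality scorer); sizes: odd window sizes.
    """
    H, W = image.shape[:2]
    total = np.zeros((H, W))
    for size in sizes:
        half = size // 2
        for y in range(H):
            for x in range(W):
                # Window clipped at the image borders.
                patch = image[max(y - half, 0):y + half + 1,
                              max(x - half, 0):x + half + 1]
                total[y, x] += score_fn(patch)
    span = total.max() - total.min()
    return (total - total.min()) / span if span > 0 else np.zeros_like(total)
```

The per-pixel loop is written for clarity rather than speed; with an expensive scorer, evaluating the windows on a coarser stride and interpolating would be a natural alternative.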


3.3. Optimization Algorithm

Let us consider, for now, a generic photo dataset and a generic importance map
definition. Under these assumptions, the optimal collage is generated by finding
the best configuration of states S which maximizes Equation 5. The solution
space of this maximization problem is of mixed type: in fact, for each state
$s_i = (\mathbf{t}_i, \theta_i, l_i)$ the translation $\mathbf{t}_i$ is discrete, $\theta_i \in \mathbb{R}$, and $l_i \in \mathbb{N}$.
In order to make the state variable types uniform, and since small variations of
$\theta_i$ do not affect the final collage, the allowed orientations are uniformly
quantized in the range $[-\theta_{max}, \theta_{max}]$.

The chosen optimization method is an extension of a Direct Search algorithm 
(DS) modified to deal with discrete solution spaces [4, 3]. DS is a derivative-free 
method for solving optimization problems [18, 24]. Since the focus of this paper 
is not on the optimization algorithm used, any non-gradient method could be 
used [31] as well as stochastic ones [17, 12]. 

The algorithm is initialized with a random configuration of states: the i-th
image is placed on the canvas at a random position $\mathbf{t}_i$ and with a random
orientation $\theta_i$. Its layering index instead is determined by the importance map
as previously described. At each iteration of the implemented DS algorithm,
the algorithm finds the best configuration of states by testing the current best
configuration against all those obtained by varying the position and orientation
in each image's state. The position of each image is then updated and a new
iteration is started. The algorithm terminates when the maximum number of
iterations has been reached.
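The iteration described above can be sketched as an exhaustive poll over the discrete positions and orientations of one image at a time. This is a simplified illustration, not the modified Direct Search of [4, 3]: the function name, the greedy acceptance of each improvement, and the early exit when no candidate improves the fitness are assumptions, and the layering index is left out of the search because it is fixed by the importance maps.

```python
import itertools
import random


def direct_search(n_images, positions, orientations, fitness, max_iter=10, seed=0):
    """Per-image exhaustive polling over a discrete grid (simplified sketch).

    positions: list of allowed (x, y) grid points; orientations: allowed angles;
    fitness: callable mapping a list of (x, y, theta) states to a value to maximize.
    """
    rng = random.Random(seed)
    # Random initial configuration of states.
    states = [(*rng.choice(positions), rng.choice(orientations))
              for _ in range(n_images)]
    best = fitness(states)
    for _ in range(max_iter):
        improved = False
        for i in range(n_images):
            # Test every allowed position and orientation for image i.
            for (x, y), theta in itertools.product(positions, orientations):
                candidate = states[:i] + [(x, y, theta)] + states[i + 1:]
                value = fitness(candidate)
                if value > best:
                    states, best, improved = candidate, value, True
        if not improved:   # early exit (an addition; the text stops on max_iter only)
            break
    return states, best
```

Each iteration evaluates the fitness once per image, grid point and orientation, which is the product of the three quantities that appear in the cost discussion of this section.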

We used the modified Direct Search algorithm without any heuristic; its
computational cost is $O(gan)$, with g being the number of grid points on the
canvas, a being the number of allowed orientations on each grid point, and n
being the number of images to be placed. Other, more efficient, optimization
algorithms can be used. Here we are interested in the effects of using different
criteria in creating the photo collages, not in the most efficient way to create them.

3.4. Experimental Setup

In our implementation the size of the canvas C is set to 400x400 pixels and all the
images have been resized such that min{width, height} = 128 pixels, maintaining
the same aspect ratio. With these constraints, the ratios between the sum of
the areas of the images in a given dataset and the area of the canvas are 2.0096
for the Burst dataset, 2.0024 for the Fashion dataset, 2.2008 for the Landscape
dataset, 1.8168 for the Self dataset, and 2.0144 for the Zen dataset. In practice
this means that we need to hide about 50% of the pixels of the images to fit
them on the canvas in a pleasant manner. Or, conversely, we need to retain the
most informative and pleasant 50% of the pixels. The canvas and image dimensions
have been chosen solely for the purpose of evaluating the performance of
our framework under typical usage constraints. We are here interested more in
the ratio of size between images and canvas than in the absolute dimensions
themselves, and we wanted the placement problem to be hard. A larger canvas
and/or larger images can be used in an actual application. It should be pointed
out that while optimization has been done on canvases of 400x400 pixels, the
subjective tests have been done on their 1600x1600 versions.

Top left image corners were allowed to be placed on a regular grid from $-2g$
to 400 in both canvas directions, with a step of $g = 50$ pixels. The set of allowed
orientations is defined in the range $[-\theta_{max}, \theta_{max}]$ = [~f ? f] steps. For
Experiment I, the values of $\lambda_1$, $\lambda_2$, and $\lambda_3$ in Equation 5 have been set to 1,
while for Experiment II, these have been learned from the users.
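The placement grid and the area ratio used in this setup are simple to compute. The sketch below assumes that both ends of the grid range are included, and the image sizes in the usage example are hypothetical (they are not the sizes of the photos in the paper's datasets); the function names are illustrative.

```python
def grid_positions(canvas=400, step=50):
    """Allowed top-left corners, from -2*step to canvas (assumed inclusive), on both axes."""
    axis = list(range(-2 * step, canvas + 1, step))
    return [(x, y) for x in axis for y in axis]


def area_ratio(image_sizes, canvas=400):
    """Sum of the image areas divided by the area of a square canvas."""
    return sum(w * h for w, h in image_sizes) / (canvas * canvas)


positions = grid_positions()
ratio = area_ratio([(128, 128)] * 14)   # 14 hypothetical square images
```

Under the inclusive assumption the grid has 11 values per axis and therefore 121 candidate positions per image. A ratio close to 2, as reported for the five datasets, means that roughly half of the image pixels cannot be shown.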


4. Subjective Experiment I 

The above algorithm has been applied to each dataset using the three importance
maps, yielding a total of 15 photo collages as shown in Figure 6.
Let us denote each collage with the corresponding configuration of states $S^{m}_{ds}$:
$ds \in \{Burst, Fashion, Landscape, Self, Zen\}$ denotes the photo dataset, and
$m \in \{sal, qua, har\}$ the importance map used.

In order to identify the criteria to be used to create pleasing photo collages, we
performed a subjective test involving several users. Test subjects were selected
taking into account age, gender and expertise in photography. Specifically, 16
subjects (Italian native speakers) were enlisted. Subjects are between 21 and
41 years old, three females and 13 males. Only one of the subjects can be considered
an expert photographer (although not professional) while the others
consider themselves amateurs. Half of the subjects stated that they shoot an
average of 3,000-4,000 photos/year. The remaining subjects shoot an average of
100-300 photos/year. All of them have a certain knowledge of digital image
processing. No relation exists between subjects and images in the photo datasets.
In this first experiment, we showed each subject the five sets of three photo
collages, one set at a time, and asked him/her to rank the three collages according
to his/her liking without judging the semantics of the scenes depicted. The subjects
were aware that the collages had been generated by different algorithms,
but no technical information and no hints about the underlying criteria were
given. This was done in order not to bias their choices. The sets of photo collages,
as well as the collages within each set, were presented in a random order.
The evaluation of all the collages and the related interviews took on average 30
minutes per subject.

Fig 6. Photo collages created on the five datasets (columns: Burst, Fashion, Landscape,
Self, Zen) using the three importance maps (rows: saliency, harmony, quality). Color and
full size images can be found at http://www.ivl.disco.unimib.it/research/collage/.

After all the test sessions had been performed, we counted the number of
times that each collage was ranked at the first (i.e. best), second, or third position
in its photo dataset. During the counting, we checked for noisy user feedback
that, in the pairwise experiment, manifests in the form of circular preferences
(e.g. A>B, B>C, and C>A). We planned to remove these subjects from the
analysis, but at the end of the experiment none of the subjects showed this
behavior.
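The consistency check on a subject's answers can be sketched as a search for a three-element cycle among pairwise preferences. The representation of the answers as (preferred, other) pairs and the function name are assumptions made for illustration; with three collages per set, a three-element cycle is the only possible circular preference.

```python
from itertools import permutations


def has_circular_preference(prefs):
    """Return True if the pairwise preferences contain a cycle a>b, b>c, c>a.

    prefs: set of (preferred, other) pairs given by one subject.
    """
    items = {x for pair in prefs for x in pair}
    return any((a, b) in prefs and (b, c) in prefs and (c, a) in prefs
               for a, b, c in permutations(items, 3))


noisy = {("A", "B"), ("B", "C"), ("C", "A")}
consistent = {("A", "B"), ("B", "C"), ("A", "C")}
```

Here `has_circular_preference(noisy)` is true and `has_circular_preference(consistent)` is false, so only the first subject would be excluded from the analysis.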

Table 4 shows the detailed results. As can be seen, in the case of the
Zen photo dataset, the results are quite polarized. Almost all the subjects
judged the collages in a similar manner, ranking first the collage created with the
Saliency map, then the one using the Harmony map, and lastly the collage using
the Quality map. The same ranking, although less polarized, can
be observed for the Burst, Fashion and Landscape sets. In all three sets, the
collage created with the Saliency map is clearly the preferred one. The one
using the Harmony map is the second best since it has been ranked second or
third fewer times than the one using the Quality map. The only set
displaying a different ranking is Self. In this case, the ranking is the opposite of
the ones obtained from the other four sets.

Number of times that each collage was ranked first, second or third

            Burst           Fashion         Landscape       Self            Zen
            1st  2nd  3rd   1st  2nd  3rd   1st  2nd  3rd   1st  2nd  3rd   1st  2nd  3rd
Saliency      9    3    4     9    4    3     9               2    5    9    11    3    2
Harmony       5    6    5     6    8    2     6               5    7    4     2   11    3
Quality       2    7    7     1    4   11     1               9    4    3     3    2   11

Importance maps ranking, according to Table 4. The numbers in parentheses are the scores
computed using the Formula One World Championship points scoring system

Set        1st             2nd             3rd
Burst      Saliency (339)  Harmony (309)   Quality (281)
Fashion    Saliency (342)  Harmony (324)   Quality (262)
Landscape  Saliency (348)  Harmony (314)   Quality (265)
Self       Quality (342)   Harmony (324)   Saliency (225)
Zen        Saliency (359)  Harmony (293)   Quality (276)

In Table 4 the final ranking of the importance maps for the five photo collage 
sets are reported. The ranking is determined by applying the Formula One World 
Championship points scoring system: each collage receives 25, 18, or 15 points 
each time that it is selected respectively first, second, or third. The numbers in 
parenthesis are the computed scores. 
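As a worked example of the scoring rule, the sketch below recomputes the Zen scores from the rank counts reported for that dataset. The names `POINTS` and `f1_score` are illustrative and do not come from the original work.

```python
# Points awarded for a first, second, and third place (as stated in the text).
POINTS = (25, 18, 15)

def f1_score(rank_counts):
    """Score of one collage from its (first, second, third) rank counts."""
    return sum(p * n for p, n in zip(POINTS, rank_counts))

# Rank counts reported for the Zen dataset.
zen_counts = {"Saliency": (11, 3, 2), "Harmony": (2, 11, 3), "Quality": (3, 2, 11)}
zen_scores = {name: f1_score(c) for name, c in zen_counts.items()}
ranking = sorted(zen_scores, key=zen_scores.get, reverse=True)

print(zen_scores)  # {'Saliency': 359, 'Harmony': 293, 'Quality': 276}
print(ranking)     # ['Saliency', 'Harmony', 'Quality']
```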

After each test, we also interviewed each subject about the reasons for their
choices, what factors influenced the selection of one photo collage over the
others, and what criteria they used. In the following, for each photo dataset, we
report a summary of the answers given by the users during the interviews.

4.1. Experiment I: Results

The Burst photo dataset is composed of images with bright colors. Many of
these images are close-ups. It is not surprising that most subjects indicated
color as a primary feature in collage evaluation. In particular, several subjects
suggested that the images should have been positioned on the canvas by taking
color similarity into account. Very dissimilar colors among neighboring images
were considered disturbing. One subject suggested hiding very dark regions,
preferring a collage with bright colors. Most subjects preferred images
placed with randomized orientations. Collages containing images with their
borders parallel to the canvas borders were penalized. Most of the images in this
dataset contain a single object of interest. Collages where this object was fully
visible were thus preferred, in particular in the case of faces.

The Fashion photo dataset is mainly composed of full-body images of female
models. Only one image is a close-up. These images are less colorful than those
in the Burst dataset, but they contain high-contrast regions. The main criterion
used in evaluating the collages was the visibility of the models. Several subjects
also indicated that having the top-layer image in a central position makes the
collage more pleasing. Secondary criteria include the visibility of a favourite
image (which is impossible to model) and the loss of brightly colored regions. No
other criteria were suggested for this dataset.

The Landscape photo dataset contains images with mostly dull colors compared
with the previous ones. No people are visible and the scenes depicted are
mostly natural. Several shots have a panoramic aspect ratio. For these reasons,
according to the subjects, the collages created on this dataset were among the
most difficult to evaluate. Images arranged in a regular way were considered
disturbing. If an image was mostly covered by the others (as for example the
violet sunset in the collage created with the Saliency map), this was considered
acceptable by many users. Overall, the collages were often considered
equivalent.

The Self photo dataset was the easiest to evaluate, probably because it contains
self-portraits. As expected, the criteria that emerged from the interviews
referred mostly to the visibility of the faces. One interesting insight on this
dataset is that, even though we encouraged the subjects to avoid judging the
image content from the semantic point of view, many choices were made based on
the appeal of the faces depicted. For example, some subjects considered the photo
of the clown unpleasant and thus a photo that could be covered before others. On
the contrary, others considered this photo very artistic. It seems that when
humans are depicted, personal preferences are difficult to ignore. This is the
only dataset containing both gray-scale and color images. Some test subjects did
not appreciate collages with spatial clusters of images of the same type.

The Zen photo dataset should inspire peace and tranquillity. It contains
photos with very few colors and details. They are mostly close-ups, and some
of the photos show soft-focus effects. Most of the subjects found it difficult to
judge the collages and to express the rationale behind their choices. However,
color composition and harmony were the most important criteria. The best collages
were those where the relevant objects were visible. One interesting criterion
that emerged on this dataset is that the shape of the visible image regions
should not be jagged. Regular (i.e. convex) shapes are considered more appealing.

5. New Criteria Definition 

From the results reported in the previous section, we can see that the users
evaluated the collages using different criteria. These criteria are both local and
global. Local criteria refer to properties either of single images or of their
neighborhoods, while global criteria refer to properties of the collage seen as a
whole.

New criteria for a pleasing photo collage

Criterion                       Description                                                  Function
Visibility                      Visible image content (based on the three importance maps)  C'_1
Canvas coverage                 Canvas area covered by the photos                            C'_2
Visibility ratio balance        Visible image region w.r.t. image size                       C'_3
Face Ratio                      Percentage of visible faces                                  C'_4
Axis Alignment                  Percentage of images having sides parallel to the canvas     C'_5
Centrality                      Position of the top-level image                              C'_6
Convexity                       Measure of jaggedness of the shape of the visible regions    C'_7
Color Similarity                Measure of color similarity between neighbor images          C'_8
Orientation Diversity           Measure of variability in the image orientations             C'_9
Minimum Orientation Difference  Minimum orientation difference between neighbor photos       C'_10


The three basic criteria reported in Table 2.1 and exploited in previous works
are not enough to capture the different nuances of pleasantness expressed by the
users. Thus, on the basis of the insights obtained from Experiment I, and taking
into account that we need to model them with computational algorithms, we
have selected the criteria in Table 5 to be used in the generation of pleasing
collages. The first three criteria are extensions of the ones in Table 2.1, where
now the importance map is computed by using a combination of the three
importance maps described in Section 3.2. The other seven criteria have been
defined following the results of Experiment I. Since the results showed that we
also need to take into account the presence of faces within the images, it is
necessary to consider, for each image, a binary mask $F_i$ containing the face
regions. These masks undergo the same geometric transformations as the importance
maps:

$$F'_i = \Gamma(F_i, s_i) \tag{6}$$

We indicate with $\mathcal{F}' = \{F'_i\}_{i=1}^{N}$ the set of transformed masks,
which is passed along with the other data to the criteria functions. In the
following we write the functions
$C'_i(\mathbf{S}; \mathcal{I}, \mathcal{M}', \mathcal{F}', \mathcal{C})$ as
$C'_i(\cdot)$, dropping the dependencies for a more compact notation.

Visibility. For each image we combine the three importance maps computed
on saliency, quality and harmony, in order to obtain a global importance map:

$$M'_i = \sum_{k \in \{sal, qua, har\}} \alpha_k M_{i,k} \tag{7}$$

where the weights $\alpha_k$ are found as described in the next section.
Visibility is thus computed as in Equation 2 by substituting the set of
transformed importance maps $\mathcal{M}$ with the new one $\mathcal{M}'$:

$$C'_1(\cdot) = \frac{1}{N} \sum_{i=1}^{N} \frac{sum_2(vis(M'_i))}{sum_2(M'_i)} \tag{8}$$

Canvas coverage. The definition of the canvas coverage is identical to the
definition of $C_2$ in Equation 3:

$$C'_2(\cdot) = \frac{1}{area(\mathcal{C})} \sum_{i=1}^{N} area(vis(M'_i)) \tag{9}$$

Visibility ratio balance. The ratio balance is computed as in Equation 4:

$$C'_3(\cdot) = 1 - \underset{i=1 \ldots N}{\mathrm{std}} \left( \frac{sum_2(vis(M'_i))}{sum_2(M'_i)} \right) \tag{10}$$
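The three extended criteria above can be illustrated with a small sketch. It rests on several assumptions that are not stated in the text: maps are stored as lists of rows, the visible part of an image is given by a 0/1 mask in image coordinates, the visible area is counted in pixels, visibility averages the per-image ratios, and the standard deviation is the population one. All numbers are made up for the example.

```python
from statistics import pstdev

def sum2(m):
    """2D sum of a map stored as a list of rows."""
    return sum(sum(row) for row in m)

def combine_maps(maps, alpha):
    """Equation 7: per-pixel weighted sum of the three importance maps."""
    names = list(alpha)
    rows, cols = len(maps[names[0]]), len(maps[names[0]][0])
    return [[sum(alpha[k] * maps[k][r][c] for k in names) for c in range(cols)]
            for r in range(rows)]

def visible(m, mask):
    """Zero out the pixels of `m` that are hidden (mask value 0)."""
    return [[v * w for v, w in zip(m_row, w_row)] for m_row, w_row in zip(m, mask)]

def first_three_criteria(importance, masks, canvas_area):
    """Return (visibility, canvas coverage, visibility ratio balance)."""
    ratios = [sum2(visible(m, w)) / sum2(m) for m, w in zip(importance, masks)]
    visibility = sum(ratios) / len(ratios)                 # mean visible ratio
    coverage = sum(sum2(w) for w in masks) / canvas_area   # visible pixels / canvas
    balance = 1 - pstdev(ratios)                           # 1 - std of the ratios
    return visibility, coverage, balance

# Hypothetical 2x2 maps and weights, for illustration only.
maps = {"sal": [[1, 0], [0, 1]], "qua": [[0, 1], [1, 0]], "har": [[1, 1], [1, 1]]}
alpha = {"sal": 0.5, "qua": 0.25, "har": 0.25}
combined = combine_maps(maps, alpha)

importance = [[[1, 1], [1, 1]], [[2, 0], [0, 2]]]
masks = [[[1, 1], [1, 1]], [[1, 1], [0, 0]]]   # the second image is half covered
c1, c2, c3 = first_three_criteria(importance, masks, canvas_area=12)
print(combined, c1, c2, c3)
```

In this toy configuration the first image is fully visible and the second keeps half of its importance, so the per-image ratios are 1.0 and 0.5.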


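Face ratio. A face detector is run on each image $I_i$ to find the mask containing
the face regions (i.e. face bounding boxes). Let the mask $F_i$ be

$$F_i(x, y) = \begin{cases} 1 & \text{if } (x, y) \in \text{face region} \\ 0 & \text{otherwise} \end{cases} \tag{11}$$

the face ratio feature is then defined as follows:

$$C'_4(\cdot) = \frac{\sum_{i=1}^{N} sum_2(vis(F'_i))}{\sum_{i=1}^{N} sum_2(F'_i)} \tag{12}$$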


Axis alignment. This feature measures the ratio of images with orientation
parallel to the axes, i.e. $\theta_i = 0$, given the maximum rotation angle
$\theta_{max}$:

$$C'_5(\cdot) = \frac{\#\{s_i \in \mathbf{S} : \theta_i = 0\}}{N} \tag{13}$$

Centrality. The centrality feature measures how central the image in the first
layer, i.e. the top-most image, is. Let us call $c_1$ the centroid of the visible
part of the image in the top layer and $c_0$ the centroid of the canvas
$\mathcal{C}$. The centrality is defined as:

$$C'_6(\cdot) = \frac{\|c_1 - c_0\|_2}{hdiag(\mathcal{C})} \tag{14}$$

where $hdiag(\cdot)$ is used to compute the half diagonal length.
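A minimal sketch of the centrality computation follows, assuming that the feature is the distance between the two centroids normalized by the half diagonal of a rectangular canvas, so that 0 corresponds to a perfectly centred top image. The function name and the canvas size are illustrative.

```python
import math

def centrality(top_centroid, canvas_width, canvas_height):
    """Distance of the top image centroid from the canvas centre,
    divided by the half diagonal of the canvas."""
    cx, cy = canvas_width / 2, canvas_height / 2
    half_diag = math.hypot(canvas_width, canvas_height) / 2
    return math.hypot(top_centroid[0] - cx, top_centroid[1] - cy) / half_diag

print(centrality((200, 150), 400, 300))  # 0.0, top image centred
print(centrality((0, 0), 400, 300))      # 1.0, centroid in a canvas corner
```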


Convexity. For each transformed image $I_i$ the convexity ratio is defined as the
ratio between the area corresponding to the image's visible region and the area
of its convex hull. The convexity feature is computed as the minimum convexity
ratio over all the transformed images:

$$C'_7(\cdot) = \min_{i=1 \ldots N} \left\{ \frac{area(vis(I_i))}{area(convex(vis(I_i)))} \right\} \tag{15}$$

Color similarity. This feature is computed by evaluating the color histogram
similarity of each image on the canvas with respect to its neighbors. For each
image we first compute:

$$d_i = \sum_{j \in neigh(I_i)} \chi^2\left(hist(vis(I_i)), hist(vis(I_j))\right) \tag{16}$$

where $hist(vis(I_i))$ is the color histogram computed on the visible portion of
$I_i$, $\chi^2$ is the chi-squared distance, and $neigh(I_i)$ represents the set
of the indexes of the images neighboring $I_i$. Color similarity is then computed
as:

$$C'_8(\cdot) = \frac{1}{N} \sum_{i=1}^{N} \frac{d_i}{\#\{neigh(I_i)\}} \tag{17}$$
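The convexity and color similarity features can be sketched as below. Two simplifications are assumed for the example: each visible region is given as a simple polygon (the original presumably works on pixel regions), and the chi-squared distance is taken in the common form sum((a-b)^2 / (a+b)), which is only one of several variants in use. Histograms and neighborhoods are invented.

```python
def polygon_area(pts):
    """Shoelace formula for a simple polygon given as (x, y) vertices."""
    n = len(pts)
    s = sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
            for i in range(n))
    return abs(s) / 2

def convex_hull(pts):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(pts))

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(points):
        chain = []
        for p in points:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    return half(pts) + half(pts[::-1])

def convexity(visible_polygons):
    """Minimum ratio between a visible region and its convex hull."""
    return min(polygon_area(p) / polygon_area(convex_hull(p))
               for p in visible_polygons)

def chi2(h1, h2):
    """Chi-squared distance between two histograms (assumed variant)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

def color_similarity(hists, neighbors):
    """Average over the images of d_i divided by the number of neighbors."""
    total = 0.0
    for i, neigh in neighbors.items():
        d_i = sum(chi2(hists[i], hists[j]) for j in neigh)
        total += d_i / len(neigh)
    return total / len(neighbors)

# An L-shaped (jagged) visible region and a square (convex) one.
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(convexity([l_shape, square]))

# Three images in a row with two-bin histograms (hypothetical values).
hists = {0: [0.5, 0.5], 1: [0.5, 0.5], 2: [1.0, 0.0]}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
print(color_similarity(hists, neighbors))
```

The L-shaped region has area 3 against a hull area of 3.5, so it determines the minimum; the square has ratio 1.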


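Orientation diversity. This feature measures the average of the variance in
orientation in each set of neighbor images:

$$C'_9(\cdot) = \frac{1}{N} \sum_{i=1}^{N} \underset{j \in \mathcal{N}_i}{\mathrm{var}} \left( \frac{\theta_j}{\theta_{max}} \right) \tag{18}$$

where $\mathcal{N}_i = neigh(I_i) \cup \{i\}$ and $\theta_{max}$ is the maximum
rotation angle allowed.

Minimum orientation difference. This feature measures the average of the
minimum orientation differences between each image $I_i$ and its neighboring set
$neigh(I_i)$:

$$C'_{10}(\cdot) = \frac{1}{N} \sum_{i=1}^{N} \min_{j \in neigh(I_i)} \left\{ \frac{|\theta_i - \theta_j|}{\theta_{max}} \right\} \tag{19}$$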
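The orientation-based features (axis alignment, orientation diversity, and minimum orientation difference) can be sketched together. The angles, the neighborhoods, and the value of the maximum rotation angle below are hypothetical, and the population variance is assumed since the text does not say which estimator is used.

```python
from statistics import pvariance

def axis_alignment(theta):
    """Fraction of images whose rotation angle is exactly zero."""
    return sum(1 for t in theta if t == 0) / len(theta)

def orientation_diversity(theta, neighbors, theta_max):
    """Mean, over the images, of the variance of the normalized angles
    within each image's neighborhood (the image itself included)."""
    total = 0.0
    for i, neigh in neighbors.items():
        group = [theta[j] / theta_max for j in list(neigh) + [i]]
        total += pvariance(group)
    return total / len(neighbors)

def min_orientation_difference(theta, neighbors, theta_max):
    """Mean, over the images, of the smallest normalized angle difference
    between an image and any of its neighbors."""
    total = 0.0
    for i, neigh in neighbors.items():
        total += min(abs(theta[i] - theta[j]) / theta_max for j in neigh)
    return total / len(neighbors)

# Three images in a row; angles in degrees, theta_max chosen arbitrarily.
theta = [0, 10, -20]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
theta_max = 40

print(axis_alignment(theta))
print(orientation_diversity(theta, neighbors, theta_max))
print(min_orientation_difference(theta, neighbors, theta_max))
```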


The new fitness function $f'(\cdot)$ to be optimized in the generation of pleasing
photo collages can be compactly written as:

$$f'(\mathbf{S}; \mathcal{I}, \mathcal{M}', \mathcal{F}', \mathcal{C}) = \sum_{i=1}^{10} \lambda'_i \, C'_i(\mathbf{S}; \mathcal{I}, \mathcal{M}', \mathcal{F}', \mathcal{C}) \tag{20}$$

where each $\lambda'_i$ weights the contribution of criterion $C'_i$; the weights
are found as described in the next section. Recall that the fitness function also
depends on the three weights $\alpha_k$ introduced in Equation 7, which are used
to compute the new importance maps.
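Once the criteria have been evaluated for a configuration, the fitness is a plain weighted sum. The sketch below uses invented criterion values and weights purely to show the mechanics; a negative weight turns a criterion into a penalty.

```python
def fitness(criteria, weights):
    """Weighted sum of the criterion values for one collage configuration."""
    if len(criteria) != len(weights):
        raise ValueError("one weight per criterion is required")
    return sum(w * c for w, c in zip(weights, criteria))

# Hypothetical values for two criteria only (the full function uses ten).
criteria = [0.75, 0.5]
weights = [2.0, -1.0]
print(fitness(criteria, weights))  # 1.0
```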


6. User Preferences Modeling and Learning 

Given as input the values $C'_i$, $i = 1, \ldots, 10$, we want to learn a single
set of optimal weights
$[\Lambda', \alpha] = [\{\lambda'_i\}_{i=1,\ldots,10}, \{\alpha_k\}_{k \in \{sal, qua, har\}}]$
to be plugged into Equation 20 so that they produce fitness values in accordance
with the user preferences that emerged from Experiment I on all the datasets
considered. To this end, for each dataset, the fitness values obtained for the
collages created using the saliency, harmony, and quality importance maps must be
in the same order reported in Table 4. Taking as an example the Burst dataset,
where the user preferences were Saliency > Harmony > Quality, we want that
$f'(\mathbf{S}^{sal}_{Burst}) > f'(\mathbf{S}^{har}_{Burst}) > f'(\mathbf{S}^{qua}_{Burst})$.
Furthermore, the relative distances between the normalized scores obtained by the
different maps and reported in Table 6 should be preserved as much as possible.
As an example, let us indicate with $sc(\cdot)$ the function that computes the
Formula One score. Taking again as an example the Burst dataset, we want the
fitness $f'(\cdot)$ to satisfy

$$\frac{f'(\mathbf{S}^{har}_{Burst})}{f'(\mathbf{S}^{sal}_{Burst})} \approx \frac{sc(\mathbf{S}^{har}_{Burst})}{sc(\mathbf{S}^{sal}_{Burst})} = 0.91 \tag{21}$$

and

$$\frac{f'(\mathbf{S}^{qua}_{Burst})}{f'(\mathbf{S}^{sal}_{Burst})} \approx \frac{sc(\mathbf{S}^{qua}_{Burst})}{sc(\mathbf{S}^{sal}_{Burst})} = 0.83 \tag{22}$$

Similar constraints come from the other four datasets considered, giving a
total of ten simultaneous constraints that Equation 20 has to satisfy.

Normalized scores obtained by scaling the scores in Table 4 by the maximum score
in each dataset

Set          Saliency   Harmony   Quality
Burst        1.00       0.91      0.83
Fashion      1.00       0.95      0.77
Landscape    1.00       0.91      0.76
Self         0.80       0.91      1.00
Zen          1.00       0.82      0.77

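The optimal weights $[\Lambda', \alpha]$ are found by solving the following
optimization problem:

$$[\Lambda', \alpha] = \underset{\lambda'_1, \ldots, \lambda'_{10},\, \alpha_{sal}, \alpha_{qua}, \alpha_{har}}{\arg\max} \; \sum_{ds \in \mathcal{D}} \left[ \tau\left(ord^{sc}_{ds}, ord^{f'}_{ds}\right) - \eta \sum_{k = \{2, 3\}} \left\| \frac{sc(\mathbf{S}^{m_k}_{ds})}{sc(\mathbf{S}^{m_1}_{ds})} - \frac{f'(\mathbf{S}^{m_k}_{ds})}{f'(\mathbf{S}^{m_1}_{ds})} \right\|_1 \right] \tag{23}$$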


where $\mathcal{D} = \{Burst, Fashion, Landscape, Self, Zen\}$, and
$ord^{sc}_{ds}$ and $ord^{f'}_{ds}$ are respectively the per-dataset user rankings
computed using $sc(\cdot)$ and the rankings induced by $f'(\cdot)$:

$$ord^{sc}_{ds} = [m_1, m_2, m_3],\; m_k \in \{sal, qua, har\} : sc(\mathbf{S}^{m_1}_{ds}) > sc(\mathbf{S}^{m_2}_{ds}) > sc(\mathbf{S}^{m_3}_{ds}) \tag{24}$$

$$ord^{f'}_{ds} = [i_1, i_2, i_3],\; i_k \in \{sal, qua, har\} : f'(\mathbf{S}^{i_1}_{ds}) > f'(\mathbf{S}^{i_2}_{ds}) > f'(\mathbf{S}^{i_3}_{ds}) \tag{25}$$

$\tau(\cdot, \cdot)$ is the Kendall tau rank correlation coefficient [21],
$\|\cdot\|_1$ is the 1-norm, and $\eta$ is a weight term that balances the
relative contributions of the two parts of which Equation 23 is made. In this
work, $\eta$ is heuristically set to 1.

Signs of the criteria weights learned by the optimization algorithm, and a description of
their effects on the creation of the collage

Criterion   Sign   Interpretation
C'_1               Promotes the visibility of the image informativeness
C'_2               Promotes the covering of the whole canvas, demoting holes
C'_3               Demotes large variations in the size of the visible parts of the images
C'_4               Promotes faces to be visible
C'_5               Promotes images to be placed aligned with canvas axis
C'_6               Promotes image in the top layer to be placed in the center
C'_7               Promotes visible parts of the images to be convex
C'_8               Promotes proximity of images with similar color histograms
C'_9               Promotes small variation of orientations among neighboring images
C'_10              Demotes neighboring images to have the same orientation

The rationale behind the optimization is that we want to automatically find
the best set of weights $[\Lambda', \alpha]$ that, plugged into Equation 23,
produce a fitness function $f'(\cdot)$ in maximum accordance with user rankings on
all the datasets used for training.

The Kendall tau rank correlation coefficient in the first term is used to
measure if, and to what extent, a given set of weights $[\Lambda', \alpha]$
produces a fitness $f'(\cdot)$ in accordance with user rankings. This means that
when the fitness function $f'(\cdot)$ is evaluated on the set of collages, its
outputs should be in the same order in which the users judged them. The second
term is introduced to prevent the scores from being too close to each other and
to obtain a meaningful ranking.
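The two terms of the objective can be illustrated for a single dataset. The sketch below is only one plausible reading of Equation 23: it assumes that the second term compares, for the second and third ranked collages, the ratio to the best user score with the corresponding ratio of fitness values. The function names and the fitness values are hypothetical; the user scores are the Zen scores reported earlier.

```python
from itertools import combinations

def kendall_tau(order_a, order_b):
    """Kendall tau between two rankings of the same items (no ties)."""
    pos_b = {item: r for r, item in enumerate(order_b)}
    concordant = discordant = 0
    for x, y in combinations(order_a, 2):   # x is ranked before y in order_a
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

def dataset_objective(user_scores, fitness_values, eta=1.0):
    """Rank agreement minus eta times the mismatch of the score ratios."""
    user_order = sorted(user_scores, key=user_scores.get, reverse=True)
    fit_order = sorted(fitness_values, key=fitness_values.get, reverse=True)
    best = user_order[0]
    penalty = sum(abs(user_scores[m] / user_scores[best]
                      - fitness_values[m] / fitness_values[best])
                  for m in user_order[1:])
    return kendall_tau(user_order, fit_order) - eta * penalty

zen_scores = {"sal": 359, "har": 293, "qua": 276}
good_fitness = {"sal": 2.0, "har": 1.6, "qua": 1.5}   # same order as the users
bad_fitness = {"sal": 1.5, "har": 1.6, "qua": 2.0}    # reversed order

print(dataset_objective(zen_scores, good_fitness))
print(dataset_objective(zen_scores, bad_fitness))
```

Under this reading, the full objective would be the sum of `dataset_objective` over the five training datasets, maximized with respect to the weights that determine the fitness values.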

Having such a fitness function gives us, for a new set of images for which
we want to build a collage, a measure of how good a certain configuration of
image states is. Furthermore, by maximizing $f'(\cdot)$ we are confident that we
are generating a collage that the users will judge good overall. The optimization
to solve Equation 23 has to be performed just once and offline, so the
computational time required to solve it is of secondary importance. In any case,
it takes only a few seconds to run, since all the inputs are computed offline
and the only operations involved are the computation of the Kendall tau rank
correlation (first term of Equation 23) and of the relative distances between
user scores and induced scores (second term of Equation 23).

The optimization problem in Equation 23 is solved using the continuous-space
implementation of the DS algorithm. The signs of the criteria weights
are reported in Table 6 together with a brief explanation of their effect on the
creation of the collage.

Once the weights $[\Lambda', \alpha]$ are learned, a new set of collages is
generated by maximizing Equation 20. For each dataset, the layering order of the
images is induced by the weights
$\alpha = [\alpha_{sal}, \alpha_{qua}, \alpha_{har}]$. In more detail, for each
image $i$ a new importance map $M'_i$ is created using Equation 7. The layering
order is then obtained by sorting in decreasing order the 2D integrals of the
importance maps $M'_i$. The collages are generated using the discrete version of
the DS algorithm introduced in Section 3.3.

Fig 7. Comparison between the best ranked collages in Experiment I (top) and the
collages created with the user preference modeling and learning procedure for
Experiment II (bottom). Color and full size images can be found at
http://www.ivl.disco.unimib.it/research/collage/.
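The layering step can be sketched as a sort on the total importance of each image. The maps below are invented, and the reading that the image with the largest integral ends up in the first (top) layer is an assumption of this example.

```python
def layering_order(importance_maps):
    """Indexes of the images sorted by decreasing 2D sum of their importance map."""
    totals = [sum(sum(row) for row in m) for m in importance_maps]
    return sorted(range(len(importance_maps)), key=lambda i: totals[i], reverse=True)

# Three hypothetical 2x2 combined importance maps.
maps = [
    [[0.1, 0.2], [0.0, 0.1]],
    [[0.9, 0.8], [0.7, 0.6]],
    [[0.5, 0.5], [0.5, 0.5]],
]
print(layering_order(maps))  # [1, 2, 0]
```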

7. Subjective Experiment II 

The final collages obtained on each dataset using the procedure described above
are reported in Figure 7. We denote each new collage with the corresponding
configuration of states $\mathbf{S}'$. For each dataset we also report the best
ranked collage from Experiment I according to the scores in Table 6. In this
experiment we wanted to understand whether the new collages were judged better
than the previous ones. To this end, we performed a pairwise subjective test. For
each dataset, users were presented with the two collages in Figure 7 and were
asked to choose the preferred one. A total of 39 subjects participated in this
experiment: 26 males and 13 females.

Results of Experiment II are reported in Table 7. In three datasets (Burst,
Landscape, and Self) the new collages were preferred by over 64% of the subjects.
In particular, the Self dataset exhibits the highest percentage of preferences,
with about 72% of the subjects choosing the new collage. The Fashion dataset shows
about 56% of preferences for the new collage. For this dataset, the new criteria
seem to be marginally effective. This is due to the artistic nature of the photos,
which makes them good-looking regardless of their positioning. The Zen dataset
continues to be the most problematic to evaluate. The subjects essentially split
in half in judging the collages, due to the particular nature of the photos'
content. On average 62% of the subjects preferred the new photo collages.

Number of times (#) and percentage (%) that a collage was preferred in Experiment II

Dataset      Collage          #     %
Burst        Saliency        13   33.3
             Final collage   26   66.7
Fashion      Saliency        17   43.6
             Final collage   22   56.4
Landscape    Saliency        14   35.9
             Final collage   25   64.1
Self         Quality         11   28.2
             Final collage   28   71.8
Zen          Saliency        19   48.7
             Final collage   20   51.3
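The percentages and their average can be recomputed from the number of votes received by the final collages and the 39 participants. This is a simple arithmetic check; the variable names are illustrative.

```python
SUBJECTS = 39  # participants in Experiment II

# Votes received by the final (learned) collage in each dataset.
final_votes = {"Burst": 26, "Fashion": 22, "Landscape": 25, "Self": 28, "Zen": 20}

percent = {name: round(100 * v / SUBJECTS, 1) for name, v in final_votes.items()}
average = sum(percent.values()) / len(percent)

print(percent)
print(round(average))  # 62
```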


8. Further experiments 

In this section further experiments are carried out to verify the generalization 
ability of the identified criteria and their learned relative importance. Three 
different experiments are presented: i) the learned definition of pleasantness is 
used to create collages on unseen image sets; ii) results obtained by our proposal 
are compared against two state-of-the-art algorithms; iii) the behavior of the 
learned definition of pleasantness is also tested by varying the number of images 
in the dataset and the canvas size. 


8.1. Generalization to new photo themes

In order to test how the learned definition of pleasantness generalizes to photo
collages not seen during the training phase, a further experiment was carried
out. The optimal set of weights learned on the five training datasets in the
previous section is used as-is to create photo collages on the new datasets. For
this experiment, six new challenges have been selected from the DP Challenge web
site. The chosen challenges are: Shallow DOF VI (Shallow for brevity), Red
V (Red), Primary Colors II (Primary), Silhouettes VI (Silhouettes), Selfie!
(Selfie), and 160 Pixels (Pixels).

The experiment performed is a pairwise subjective test similar to the one
used in Experiment II. For each dataset, users were presented with the two
collages in Figure 8 and were asked to choose the preferred one. Following
the results of Experiment I, one of the collages was generated using a single
importance map; the other one was generated by maximizing the learned fitness.
A total of 42 subjects participated in this subjective experiment: 22 males and
20 females.

Results from this experiment showed that on average 61% of the subjects
preferred the photo collages created using the learned definition of pleasantness.
In particular, in four datasets (Shallow, Primary, Selfie and Pixels) these
collages were preferred by over 66% of the subjects. For the remaining two
datasets (Red and Silhouettes), instead, the two collage versions tied.

Fig 8. Generalization to new photo themes (panels: Shallow, Red, Primary,
Silhouettes, Selfie, Pixels). These collages are created using the proposed
framework with the single importance map (top) and the same learned fitness and
weights used to create the collages in Figure 7 (bottom). Color and full size
images can be found at http://www.ivl.disco.unimib.it/research/collage/.


8.2. Comparing collages

The different collage algorithms in the state of the art are based on different
philosophies: keeping images at the same size vs. allowing image resizing;
allowing image rotation vs. not; preserving image borders vs. blending image
contents; allowing images to overlap vs. not. We ran an experiment to compare our
collage results with those of two algorithms belonging to the non-content
preserving category (the same as ours) but using different philosophies: Shape
Collage^ (a commercial software), and Autocollage [34]. The most relevant
differences between the algorithms are that Shape Collage and our algorithm allow
images to be rotated, while Autocollage does not. Moreover, Autocollage blends the
images together to have a smooth transition between them, while Shape Collage and
our algorithm do not.

For this comparison, we used the six image datasets used in Section 8.1. We
set the parameters of the Autocollage and Shape Collage algorithms to generate
collages of 14 images on a square canvas, with an image-to-canvas ratio as similar
as possible to the one in our set-up. The collages generated with the different
methods are shown in Figure 9.

The same subjects that participated in the previous experiment participated
in this one. We asked them to choose which of the three collages they
preferred. Results from this experiment showed that on average 56% of the
subjects preferred our photo collages, 42% preferred the Autocollage results, and
just 2% preferred the Shape Collage results. In particular, in four datasets (Red,
Primary, Selfie and Pixels) our collages were preferred by 65% of the subjects.
For the remaining two datasets (Shallow and Silhouettes), instead, 64% of the
subjects preferred the Autocollage results. This is due to the nature of the
images used: in both categories the images contain a subject with an out-of-focus
(Shallow) or almost uniform (Silhouettes) background. This makes it easier for
Autocollage to nicely blend image contents.

^http://www.shapecollage.com/

User Preferences Modeling and Learning for Pleasing Photo Collage Generation

Fig 9. Comparison of our collage against the Shape Collage and Autocollage
algorithms (panels: Shallow, Red, Primary, Silhouettes, Selfie, Pixels).


8.3. Varying collage sizes

In this experiment we test how the learned definition of pleasantness generalizes
to datasets with a different number of input images per collage and different
canvas sizes. Two smaller and two larger variants of the Red dataset have been
considered, containing 5, 10, 25, and 50 images respectively. Canvas sizes have
been chosen so that they are almost half of the total area covered by the images,
as in Section 3. Thus optimization has been performed on canvases having side
length equal to 250, 350, 550, and 750. The results are reported in Figure 10 and
compared with the results obtained by Autocollage, which was the best
competing algorithm in the previous section. All the canvases have been resized
to equal size for better visualization. The result of Autocollage in the case of
5 images is not available, since the minimum number of images it can handle
is 7. These collages were judged by 20 subjects. We asked them to choose which of
the two collages they preferred. The results that we collected are the following.
For the three collages with few images, the majority of the subjects chose our
collage, with percentages of 100%, 70% and 65% for 5, 10 and 14 images
respectively. In the case of 25 images, the difference between our collage and the
Autocollage one is reduced, with 55% of the users choosing our collage and 45%
choosing the Autocollage one. Finally, the gap between the two approaches further
reduces in the case of 50 images to practically a tie (50% of preferences). The
interviews with the users revealed that, when presented with the collages with 50
images, the limited canvas size made them pay less attention to the actual content
of the images, while favoring the overall image distribution. In this case, the
two collages were considered equally cluttered, but the smooth carving of
Autocollage made its collage more pleasing.

Fig 10. Generalization to datasets with different number of input images per
collage (5, 10, 14, 25, 50) and different canvas sizes (250, 350, 550, and 750
pixels respectively). For each collage the canvas size is almost half of the area
covered by all the images. The images have the same dimensions for visualization
purposes.


9. Conclusion 

In this work we have considered the problem of creating pleasing photo collages 
by exploiting subjective experiments to model and learn user preferences. We de¬ 
signed an experimental framework for the identification of the criteria that need 
to be taken into account to generate a pleasing photo collage. Starting from 
collages created using state-of-the-art criteria, namely photo informativeness, 
canvas area coverage, and information ratio balance, we performed a subjective 
experiment involving several subjects on different thematic photo datasets. This 
experiment showed that different and more complex criteria are involved in the 
subjective definition of pleasantness. Inspired by the responses of the subjects, 
we have redefined the basic criteria and we have identified and implemented new 
global and local ones: face ratio, axis alignment, centrality, convexity, color sim¬ 
ilarity, orientation diversity and minimum orientation difference. The relative 
importance of all these criteria has been learned by exploiting user rankings. 
Moreover, with the proposed experimental framework we learned a composite 
photo informativeness description from saliency, quality and harmony. A new 
set of collages has been generated using the identified criteria and evaluated in
a pairwise comparison experiment against the previous best rated collages. The
new collages were preferred by the majority of the subjects for all the photo 
datasets considered, showing that the proposed framework is able to identify 
and combine the criteria at the basis of user preference, and to learn a compu¬ 
tational model which effectively encodes an inter-user definition of pleasantness. 
A further experiment has been run, showing that the learned definition of pleas¬ 
antness generalizes well to new thematic photo datasets not used in the training 
phase. 

Photo informativeness has been described in terms of saliency, quality, and
harmony maps, but other maps taking into account different image properties
can be incorporated in our framework as well (e.g. photo memorability by Isola
et al. [20, 22]). Furthermore, by leveraging user preferences, the proposed
framework makes it possible to quantify the contribution of different visual
features to model new intrinsic properties of the images.

The proposed framework can benefit current collage generation algorithms in
two different ways. The first is its use to estimate the weights of the fitness
function (also called energy function) in the different collage generation
algorithms, e.g. the weights associated with region importance, transition cost,
object sensitivity and face presence in Autocollage [34]; representativeness,
compactness and transition smoothness in Video collage [29]; salience visibility,
salience ratio balance, penalty of severe occlusions, blank space presence, canvas
shape constraint, spatial uniformity and orientation diversity in Picture collage
[37, 26]; image complexity and content distinctness in [42]. All these algorithms
heuristically set the weights associated with the different terms in their fitness
functions. With our framework, these weights can be systematically set using user
preferences. This requires generating training data in the form of multiple
collages and collecting user judgments about them. This operation has to be done
only once and does not impact collage generation time. The second way in which
existing algorithms can leverage our work is by including the new criteria defined
here inside their fitness/energy functions. This will not dramatically slow down
the collage generation process, since the new criteria are fast to compute.

As future work we plan to investigate whether the learned definition of
pleasantness changes when subjects and photos are linked. We also plan to expand
the set of criteria by enlarging the number of subjects in the experiments, and by
adding more thematic photo datasets.